Preamble

This worksheet contains material for an introductory QGIS course held at the Cathie Marsh Institute for Social Research on 26 February 2020. All material is available on GitHub.

Background

The aim of this short course is to get you started using QGIS as (completely free!) software for creating, manipulating, exploring and visualising spatial data. As we will see, QGIS is a powerful tool for generating aesthetically pleasing maps, but it is also invaluable for conducting analysis and substantive exploratory research. The material we cover today will equip you with the skills necessary to begin visualising and analysing your own data. Once you’re comfortable with the skills introduced in this worksheet, please feel free to put them into practice using your own data, or data downloaded from the data resources section of this worksheet.

You can consider today to be a ‘crash course’ in QGIS. As such, it’s worth remembering that the materials we cover today form part of a much wider field commonly known as ‘Geographic Information Science’ (GIS). If you are interested in exploring this field further, there are some recommended books in the further reading at the bottom of this page. That said, this course will give you more than enough to get started exploring spatial data and making maps.

Why use QGIS?

QGIS is an open-source piece of software. This means, for one thing, that it is free, which represents a significant advantage over comparable software like ArcGIS for students and institutions (understandably) unwilling to fork out for licence fees. But that’s just a practical advantage: open-source also means transparency, continuous development and a supportive community of developers and users. QGIS is a key part of a wider, growing movement towards open-source software in geospatial analysis, with tools like GeoDa and GIS functionality in R becoming increasingly popular. A key benefit of QGIS being open-source is that it is constantly evolving, with lots of smart people continuously contributing to new versions and plug-ins to expand its capability.

QGIS is part of a wider open-source movement



You will also find a wealth of documentation and resources online, largely generated by the developers and users themselves. Websites like StackExchange are full of people willing to answer your questions, and I guarantee you that most queries you have will already have been answered somewhere! To date, nearly thirty thousand questions have been asked about QGIS. The friendly online community of QGIS developers and users is possibly the most extensive resource out there!

Spatial data

Before we get to grips with the software itself, let’s cover some preliminary basics, beginning with spatial data types. The diversity of topics in geographical research has motivated the collection of an enormous array of information which can be quantified for use in software like QGIS. Data collected for making maps will inherently include some spatial component, describing the location of an entity in space. It might also incorporate attribute data: non-spatial characteristics which describe entities. There are numerous ways in which spatial data can be stored and used in GIS software, including QGIS, but the most common data types are vector and raster data.

Vector data

Vector data represent features in the real world through points, lines and polygons. Standing on top of a skyscraper, overlooking a city, one will observe buildings, parks, street lights and roads, each comprising discrete features of the urban landscape. Vector data are made up of vertices, which define the geometry of these features. The simplest geometric form is a two-dimensional vertex: a single X (longitude) and Y (latitude) coordinate describing a specific point location. When vertices are connected in order, with different start and end points, a line is formed. Lines whose start and end points are the same, with at least three vertices, represent polygons. In our urban landscape, points might be used to represent street lights, lines to represent roads, and polygons to represent buildings. Of course, a great deal of spatial data defines objects which do not physically exist on the ground, such as electoral wards or neighbourhood boundaries. So, the vertices collectively describe where objects are in space, and the attributes describe the objects themselves. Given its popularity in social science research, we will focus on vector data throughout today.
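The Python sketch below makes these rules concrete. It classifies an ordered list of vertices as a point, a line or a polygon. It illustrates the definitions above and is not how QGIS stores geometries internally. The function `geometry_type` and the example coordinates are hypothetical.

```python
from typing import List, Tuple

# A vertex is an (x, y) pair, e.g. (longitude, latitude) in WGS 84.
Vertex = Tuple[float, float]


def geometry_type(vertices: List[Vertex]) -> str:
    """Classify an ordered sequence of vertices as 'point', 'line' or 'polygon'.

    - one vertex: a point
    - several vertices with different start and end points: a line
    - start vertex repeated as the end vertex, with at least three distinct
      vertices: a polygon (a closed ring)
    """
    if not vertices:
        raise ValueError("a geometry needs at least one vertex")
    if len(vertices) == 1:
        return "point"
    if vertices[0] != vertices[-1]:
        return "line"
    if len(set(vertices)) >= 3:
        return "polygon"
    raise ValueError("a closed ring needs at least three distinct vertices")


# Hypothetical features from the urban landscape described above.
street_light = [(0.0, 0.0)]
road = [(0.0, 0.0), (50.0, 0.0), (80.0, 40.0)]
building = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0), (0.0, 0.0)]

for name, shape in [("street light", street_light), ("road", road), ("building", building)]:
    print(name, "->", geometry_type(shape))
```

The building's first and last vertices are the same, so it is treated as a closed polygon. The road's are different, so it stays a line.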

Vector data. Source: [Data Carpentry](https://datacarpentry.org/organization-geospatial/02-intro-vector-data/) via the National Ecological Observatory Network (NEON)



Raster data

In some circumstances, vector data is unsuitable. Looking down from our skyscraper, one might also observe variation in air pollution across the city. This cannot easily or intuitively be represented using vector geometries such as lines or polygons. Air pollution might vary considerably within streets or parks, and consequently, attribute data associated with lines or polygons would mask a great deal of information. In such circumstances, raster data may be able to represent the real world more accurately than vector data. Rasters consist of a regular grid of cells, each of which contains associated attribute data, and can be used to represent continuous spatial information such as air pollution or remote sensing imagery of the Earth’s surface. The most common use of raster data you might have come across is meteorological maps, such as those used in weather reports. Data about regionwide precipitation or temperature, for instance, is often stored in raster format. As noted earlier, today we are going to focus on vector data, but if you’d like to try some raster examples, please feel free to look at the raster-specific resources at the end of this worksheet, or give me a shout!
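A raster can be thought of as a two-dimensional array plus the information needed to place it on the map: where its corner sits and how big each cell is. The Python sketch below assumes a simple north-up grid with square cells. The `Raster` class and the pollution values are hypothetical. Real raster formats store this placement information in their own ways.

```python
import math
from typing import List


class Raster:
    """A regular grid of square cells, each holding one value."""

    def __init__(self, values: List[List[float]], x_min: float, y_max: float, cell_size: float):
        self.values = values        # values[row][col]; row 0 is the top (northernmost) row
        self.x_min = x_min          # x coordinate of the grid's left edge
        self.y_max = y_max          # y coordinate of the grid's top edge
        self.cell_size = cell_size  # width and height of every cell, in map units

    def value_at(self, x: float, y: float) -> float:
        """Return the value of the cell that contains the location (x, y)."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("location lies outside the raster")
        return self.values[row][col]


# Hypothetical air pollution readings: 2 rows x 3 columns of 100 m cells.
pollution = Raster([[40.0, 42.0, 45.0],
                    [38.0, 50.0, 47.0]], x_min=0.0, y_max=200.0, cell_size=100.0)
print(pollution.value_at(150.0, 120.0))  # cell in row 0, column 1 -> 42.0
```

Every location inside a cell gets that cell's value. This is why a fine grid can capture variation *within* a street or park, which a single polygon attribute would average away.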

Projections

Representing earth

As we noted earlier, maps are representations of the real world. Importantly, these representations tend to be created on a flat surface (a computer screen or piece of paper) even though the earth itself is more-or-less spherical. To portray spatial entities, whether crime locations or any other phenomena, on a flat surface, we perform a transformation known as a ‘projection’. This is quite the mathematical challenge, and can be carried out in countless different ways, each of which has its own advantages and disadvantages. For instance, until recently, Google Maps used a projection known as the Mercator projection. Mercator is useful for navigational purposes, but it distorts the earth so that land masses near the equator, such as Africa, appear much smaller than they really are relative to land masses near the poles, such as Greenland, which appear much larger. For a light-hearted look at different projections of the world map, I recommend this blog post.
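The Mercator distortion can be put into numbers. On the spherical Mercator projection, distances at latitude φ are stretched by a factor of sec φ in every direction, so areas are stretched by sec² φ. The Python sketch below assumes a sphere whose radius equals the WGS 84 semi-major axis (6,378,137 m), as web maps commonly do. The function names are hypothetical.

```python
import math

# Assumption: spherical Mercator on a sphere with the WGS 84 semi-major axis as radius.
RADIUS_M = 6_378_137.0


def mercator(lon_deg: float, lat_deg: float):
    """Project a longitude/latitude pair (degrees) to Mercator x/y in metres."""
    x = RADIUS_M * math.radians(lon_deg)
    y = RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def area_exaggeration(lat_deg: float) -> float:
    """Factor by which Mercator enlarges areas at this latitude relative to the equator.

    Lengths are stretched by sec(latitude) in every direction, so areas grow by sec^2.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for place, lat in [("Equator", 0.0), ("Manchester", 53.5), ("Central Greenland", 72.0)]:
    print(f"{place}: area scale factor {area_exaggeration(lat):.1f}")
```

At the equator the factor is 1. At around 72°N, roughly the middle of Greenland, areas are drawn about ten times too large. This is why Greenland can look comparable in size to Africa on a Mercator map.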


Source: [Brilliant Maps](https://brilliantmaps.com/xkcd/)



Coordinate Reference Systems

When working within GIS software like QGIS, we are subject to the same restrictions, since we are representing real-world information on a flat computer screen. Any spatial information you are using in QGIS, whether it be tram stop locations, neighbourhood boundaries or river formations, must have an associated Coordinate Reference System (CRS). This ensures that we know how our 2D projected maps relate to the actual features on our (pretty much) spherical earth. You are probably already familiar with the most common type of CRS, although perhaps not by name. It is known as a Geographic Coordinate Reference System, because it uses latitude and longitude coordinates to define specific points on the earth’s surface. The most widely used of these is WGS 84. You might even have noticed that when you select a point in Google Maps, it automatically brings up the latitude and longitude coordinates of that location in a white box at the bottom of the window. It is through this system that a click on the screen can be related to a real place on earth.


An example of latitude and longitude coordinates on the Google Maps online platform



As we’ll find out later today, not all data you have collected or downloaded will use latitude and longitude coordinates. For example, lots of data released in Britain uses a projected CRS called the British National Grid (BNG). Rather than longitude and latitude, the BNG uses Eastings and Northings, measured in metres, to define locations in Great Britain based on a grid system. It has some advantages, such as preserving shapes and allowing directions to be calculated accurately. In fact, many areas of the world have their own projected CRS for similar reasons. It is beyond the scope of this course (and indeed, beyond what many GIS users ever need) to discuss the merits and shortcomings of different CRS in detail. However, it is important to be aware of the CRS associated with your data, and to ensure that you are using the most appropriate one. Doing so will ensure that you are displaying information accurately, especially when overlaying multiple data sources. We went through this practically during the live demonstration of QGIS, and it will be covered again during the exercises later in this worksheet, using both WGS 84 and the BNG.
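BNG eastings and northings are measured in metres. As a result, the straight-line distance between two nearby points is plain Pythagoras. Latitude and longitude are angles, so the same calculation needs a great-circle formula such as haversine. The Python sketch below assumes a spherical earth with a mean radius of 6,371 km for the haversine formula. The coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: a spherical earth with mean radius 6,371 km


def bng_distance(e1: float, n1: float, e2: float, n2: float) -> float:
    """Straight-line distance in metres between two BNG (EPSG:27700) points."""
    return math.hypot(e2 - e1, n2 - n1)


def haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS 84 (EPSG:4326) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical points 3 km east and 4 km north of one another on the BNG.
print(bng_distance(380_000, 395_000, 383_000, 399_000))  # 5000.0
# Subtracting degrees directly does not give metres; the haversine formula does.
print(round(haversine_distance(53.0, -2.0, 54.0, -2.0)))
```

The BNG version is only a close approximation over long distances, because any projection distorts slightly. For most work within a city, however, it is more than accurate enough.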

If you want to read more about projections and CRS, you can read the excellent QGIS documentation online. There are also some useful resources made available by Data Carpentry. Alternatively, please feel free to ask me!


British National Grid nested grids. Source: [Ordnance Survey](https://getoutside.ordnancesurvey.co.uk/guides/beginners-guide-to-grid-references/)



Now that we are familiar with some GIS fundamentals, we can move on to opening up QGIS and exploring the interface.

QGIS interface

Quick tour

For these tutorials we will be using QGIS version 3.6.3 in order to match what is in the computer lab. If you are using your own laptop, you might have a different version, but it shouldn’t make too much difference. All previous releases of QGIS can be downloaded retrospectively from their website if you want the exact same version.

Basic QGIS interface



When you start up QGIS, an interface resembling the above screenshot should open. There might be slight differences depending on whether someone has used QGIS on your laptop or computer before. The main window is the map view, which visualises any spatial data you create or load into the software, so for now, it’s completely blank. On the left is the layers window, which provides a summary of the different “layers” of spatial data you are using. As we saw earlier, one of the most useful functions of QGIS is the ability to overlay different spatial data sources from the same area on top of one another. This window can be used to deselect layers, and to change things like transparency to aid exploration of multiple layers simultaneously. At the top of the interface are the various tools available. Some of these have dedicated toolbar icons, but most functionality is available through the drop-down menus, which we’ll explore later. The bottom of the interface includes live information about your map view, and importantly tells you which CRS is currently being used. In this case, our default is EPSG 4326, which is the EPSG registry code for WGS 84, introduced above.

Plug-ins

A key benefit of open-source software like QGIS is the continuous development of its functionality. One way in which QGIS benefits from this is through plug-ins, which can be installed directly from within the software. Plug-ins have largely been developed by the QGIS community, and for that reason, they are often updated with new tools and options. Before we get going with some data, we are going to install a plug-in called QuickMapServices, which allows you to add Open Street Map base maps to your visualisations. First, open the plug-in installation menu from the Plug-ins drop-down menu, search for the plug-in and install it, as demonstrated below.



Step 1: Find the ‘Manage and Install’ option from the Plug-ins drop-down menu




Step 2: Search for Quick Map Services and click ‘Install Plugin’

The plug-in will now become available under the Web drop-down menu. Don’t worry about using it yet, we will get into that in a minute! Let’s move on to our first exercise.

Exercise 1: tram stops

Raw data

To demonstrate some of the functionality in QGIS, we are going to use some data about tram stops on Greater Manchester’s Metrolink service. The data was compiled from some open government data and some information about facilities available at each tram stop. Start by downloading this data directly as a .csv file and saving it in a folder on your machine. Explore it using Excel. It will look something like this:



Data structure for trams_geo.csv

Each row is an observation, in this case a tram stop in Greater Manchester, of which there are 93. Each column is a variable giving us additional information about each stop. These variables contain a fair bit of information: the tram stop name, the line it’s on, the number of cycle stands and blue badge parking spaces, whether it has lift access, and so on. Most importantly for us, there are two variables called eastings and northings. So, we have the spatial location of each tram stop in the projected CRS of the British National Grid. But although these coordinates are spatial information, telling us where each tram stop is located on the earth’s surface, Excel is just treating them like any other numeric variable. Using QGIS, we can convert this boring old spreadsheet into spatial data. Hurray!

Creating point data

To make this conversion, we are going to add a new layer to our project in QGIS. We can do this using a specific option designed to pull out coordinates from a delimited text file using Layer -> Add Layer -> Add Delimited Text Layer on the drop-down menus.



Adding a layer from a .csv file

Bringing up this box will give you a series of options. First, we need to select the .csv file itself using the File name box, by finding the file location on our local machine. Doing this will automatically fill in most of the remaining options and bring up a summary of how QGIS has read in the data, identifying the rows and columns. Often, you will have to select manually which columns represent which coordinates, and you will need to specify the CRS. There is a good chance that QGIS has actually done this for you. If not, we know that our coordinate columns are eastings (X field) and northings (Y field). We also know that, because we have eastings and northings for locations in Britain, the CRS will be the BNG, which has the EPSG code 27700.
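Conceptually, the dialog turns each row into a point feature. The X value comes from the eastings column and the Y value from the northings column. Every original column is kept as an attribute, and the layer is tagged with its CRS. The minimal Python sketch below illustrates that idea. The function name and the sample rows are hypothetical, and QGIS's own implementation differs.

```python
import csv
import io
from typing import Dict, List


def points_from_delimited_text(text: str, x_field: str, y_field: str, crs: str) -> Dict:
    """Build a simple point layer from delimited text.

    Every row becomes one feature: its geometry is the (x, y) pair read from the
    chosen columns, and all original columns are kept as attributes.
    """
    features: List[Dict] = []
    for row in csv.DictReader(io.StringIO(text)):
        geometry = (float(row[x_field]), float(row[y_field]))
        features.append({"geometry": geometry, "attributes": dict(row)})
    return {"crs": crs, "features": features}


# Made-up rows in the same shape as trams_geo.csv (the values are not real).
sample = """stop,line,eastings,northings
Stop A,Line 1,384000,398000
Stop B,Line 2,383500,397200
"""
layer = points_from_delimited_text(sample, x_field="eastings", y_field="northings", crs="EPSG:27700")
for feature in layer["features"]:
    print(feature["attributes"]["stop"], feature["geometry"])
```

The attributes are copied unchanged, which is why the attribute table in QGIS looks exactly like the original spreadsheet.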



Completing information needed to create spatial points from a .csv file

Once you’ve completed this information, click Add and close the window, and there we have it! You are viewing the point locations of tram stops in Greater Manchester. You can navigate around this data by scrolling, and by clicking and dragging with your mouse. The Layer window now contains trams_geo. To connect this back to our introduction, it’s worth mentioning that these points are vector point data, a very common way of representing specific pinpoint locations in QGIS. Each point has retained its attribute data from the original .csv file. You can view it by clicking on the table symbol in the toolbar at the top of your interface. You’ll notice that this table is exactly the same as our original spreadsheet.



Attribute table icon

Preliminary exploration

A good way to begin exploring data like this, either out of interest or to identify interesting patterns, is by using the Properties... option within the Layer window. You can access this option by right-clicking on the name of the layer itself, in this case, trams_geo. It will bring up a window with lots of options down the left-hand side, including basic information about the layer itself (e.g. the CRS), but also options to add information to your map using symbology and labels. There are endless options within these properties, many of which we’ll cover today, but for starters, let’s add some labels to our points so we can identify what is what. A basic single label will display the stop variable (i.e. the name of the stop) for each point, as shown in the below screenshot. Feel free to make amendments to the text font, style and size as you see fit.



Adding labels to points


Once you click Apply, the labels will be added to the map view window. Now we can actually see which point corresponds to which tram stop.



Map view of points with labels


Just as we linked the stop variable to each point to display its name, we can use these properties to change other visual features. An accessible and simple way to portray information about each point is to colour the points by a variable using symbology. Let’s colour each point according to the line on which the stop is situated. Because the line variable is discrete (i.e. categorical), we will replace the current basic single symbol with the Categorized option.




Choosing a categorized symbology

We can then select the Column we are interested in, which in this case is line, from the drop-down menu, and choose a colour ramp to apply to the categories. Because lines are discrete and don’t have any inherent order, I will just keep it as random colours. Next, click Classify to generate the categories. Often, QGIS will by default create an ‘other’ category for observations which do not fall into any category (e.g. missing values). You can remove this category by highlighting it and clicking on the red minus sign. If you don’t like the colours created at random, or they are too similar to one another, you can right-click on each category and select Change Color.
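At its core, a categorized renderer is a lookup table that maps each distinct value to a symbol. The minimal Python sketch below illustrates the idea, assuming hex colour strings stand in for symbols and using hypothetical line names. QGIS's own classification, ordering and colour choices will differ.

```python
import random
from typing import Dict, List, Optional


def categorize(values: List[Optional[str]], seed: int = 42) -> Dict[str, str]:
    """Give each distinct category a random hex colour.

    Missing values (None or "") are collected into a single 'other' category,
    which can then be dropped if it is not wanted.
    """
    rng = random.Random(seed)  # seeded so the same colours come back on every run
    colours: Dict[str, str] = {}
    for value in values:
        category = value if value else "other"
        if category not in colours:
            colours[category] = "#{:06x}".format(rng.randrange(0x1000000))
    return colours


# Hypothetical values of the line variable, including one missing value.
lines = ["Line A", "Line B", "Line A", None, "Line C"]
palette = categorize(lines)
palette.pop("other", None)  # like removing the 'other' class with the red minus sign
print(palette)
```

Every stop on the same line receives the same colour, because the lookup is keyed on the category rather than on the individual point.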




Colouring each point by the line variable, and creating a discrete categorisation. You can change each colour by hand.


As before, clicking on Apply will make these changes in our map view. It now shows which stop is which, and distinguishes between the different lines.



Updated map view with points coloured by the line variable


The possibilities of symbology preferences are pretty expansive. We can visualise a continuous variable using the Graduated option (instead of Categorized, used above). For this example, let’s use the bb_spaces variable, which tells us how many blue badge parking spaces are available at each tram stop. Doing so will give us an indication of which tram stops have more or fewer spaces, but it will also tell us whether there is a meaningful geographic distribution to these patterns. Have a go at this now, using the below example as a guide, and adjust things like the number of classes as you find appropriate.



Changing our symbology preferences to size points according to the number of blue badge parking spaces
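A Graduated renderer first splits a numeric variable into classes and then assigns each class a symbol. QGIS offers several classification modes, for example equal interval, quantile and natural breaks. The Python sketch below shows only equal interval, with point size growing linearly from class to class. The function names, the size range and the bb_spaces values are hypothetical.

```python
from typing import List


def equal_interval_breaks(values: List[float], n_classes: int) -> List[float]:
    """Upper bound of each of n equal-width classes spanning the data range."""
    if n_classes < 1:
        raise ValueError("need at least one class")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    breaks = [lo + width * (i + 1) for i in range(n_classes)]
    breaks[-1] = float(hi)  # guard against floating-point rounding at the top end
    return breaks


def class_of(value: float, breaks: List[float]) -> int:
    """Index of the first class whose upper bound is at least the value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


def symbol_size(value: float, breaks: List[float], min_size: float = 1.0, max_size: float = 6.0) -> float:
    """Scale symbol size linearly with class index, from min_size to max_size."""
    if len(breaks) == 1:
        return min_size
    return min_size + (max_size - min_size) * class_of(value, breaks) / (len(breaks) - 1)


# Hypothetical bb_spaces values for five stops.
bb_spaces = [0, 2, 4, 10, 20]
breaks = equal_interval_breaks(bb_spaces, n_classes=4)  # [5.0, 10.0, 15.0, 20.0]
for spaces in bb_spaces:
    print(spaces, "spaces -> class", class_of(spaces, breaks), "size", symbol_size(spaces, breaks))
```

With equal intervals, a few very large values can leave most points in the lowest class. That is one reason to try the other modes QGIS offers.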


We can see that not only is there variation between tram stops in the number of blue badge parking spaces, but there is also a spatial pattern to these distributions: the city centre has few spaces, and stops near the end of lines have many. It also helps us spot potential issues. It is unlikely that Manchester Airport has no blue badge spaces nearby, so such visual explorations can help us identify areas which demand further explanation. For example, perhaps the spaces are not near the tram stop itself, or are not free.



Updated map view with points sized according to the number of spaces


To give a bit of local context to these maps, we can make use of the Quick Map Services plug-in we downloaded earlier to add an Open Street Map layer to our project. We can do this by selecting Web -> QuickMapServices -> OSM -> OSM Standard.



Loading an Open Street Map layer


We can alter the appearance of this layer using its symbology properties. For the below map, the Open Street Map (OSM) layer has been made grayscale and its brightness turned up, so that our tram stop points stand out. The stop labels have been turned off because the OSM layer now provides the context. We can now interactively explore the map to find explanations for the patterns observed, or perhaps to identify areas where city planners could improve accessibility for blue badge holders at tram stops.



Updated map view with points sized according to the number of spaces, with an OSM layer


Spend some time trying out different labelling and symbology options on different variables to answer your own research questions. How do different tram stops fare when it comes to other forms of accessibility, such as ramps? How might we best visualise this?

Once you’ve had a good exploration of the tram data, feel free to move on to the next exercise using police recorded crime data.

Exercise 2: crime maps

Raw data

To practise some of what we’ve learnt so far, and to learn some new skills, we’re going to be using some police recorded crime data for Greater Manchester. The data was compiled from an open data portal and contains information about anti-social behaviour incidents recorded by police in Greater Manchester during 2017. Download this data now and save it to a folder on your local machine.

Opening it up in Excel, you will notice that we have the latitude and longitude coordinates of these incidents, the Lower Super Output Area (LSOA) code in which each incident occurred, the crime type (ASB) and the month. So again, we have a standard spreadsheet containing spatial information, but it is not being treated as such in Excel. Based on the tram stop example, you now know how to make this data spatial using the Layer -> Add Layer -> Add Delimited Text Layer drop-down menu. This time, however, note that we have latitude and longitude coordinates, not eastings and northings. This should be setting off alarm bells: you will need to specify the ol’ WGS 84 (EPSG 4326) as the CRS, rather than the British National Grid. Remember that latitude goes on the Y-axis, and longitude on the X-axis. Have a go at getting this .csv file into QGIS now, turning those coordinates into vector points, just as we did earlier for tram stops. You should end up with a map something like the one below.



Incident locations of anti-social behaviour in Greater Manchester, 2017

For some of you, the above map might look a bit squished, and you will intuitively know that it does not look right. This could be for a number of reasons. The most common one (for me, anyway) is that the latitude and longitude coordinates are the wrong way around! Another common reason is that QGIS is automatically reprojecting your data ‘on the fly’. It does this to help you, so that layers in different CRS still line up with one another, but it can also be a bit confusing. You can see in the above screenshot, for example, that my QGIS project is actually using the BNG (EPSG 27700), as stated in the bottom right of the interface. That’s because I had just finished working with the tram data, which was in BNG. You can change the CRS of your whole project by clicking on this tab and selecting (in this case) between WGS 84 and BNG as appropriate. If your map looks a bit dodgy, try doing this to resolve it. A good way to check what you’ve done is to add the Open Street Map layer again. If you are working with data from Great Britain, I would recommend sticking with the BNG.
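The Python sketch below covers two of the checks just described: getting the axis order right, and recognising which CRS a pair of numbers probably belongs to. It assumes the data lies in Great Britain and uses approximate bounding boxes. BNG eastings run from 0 to 700,000 m and northings from 0 to 1,300,000 m. In WGS 84, Great Britain lies roughly between longitude -9 and 2 and latitude 49 and 61. The column names `longitude` and `latitude` and the helper functions are hypothetical. This is a rough heuristic, not a substitute for knowing your data's CRS.

```python
from typing import Dict, Tuple


def lonlat_point(row: Dict[str, str],
                 lon_field: str = "longitude",
                 lat_field: str = "latitude") -> Tuple[float, float]:
    """Build an (x, y) point from a WGS 84 row: x is longitude, y is latitude."""
    lon, lat = float(row[lon_field]), float(row[lat_field])
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"not valid WGS 84 coordinates: lon={lon}, lat={lat}")
    return lon, lat  # X first, then Y


def _in_gb_lonlat(lon: float, lat: float) -> bool:
    """Approximate bounding box of Great Britain in WGS 84 degrees."""
    return -9.0 <= lon <= 2.0 and 49.0 <= lat <= 61.0


def diagnose(x: float, y: float) -> str:
    """Rough guess at what an (x, y) pair for a location in Great Britain represents."""
    if _in_gb_lonlat(x, y):
        return "WGS84"          # x = longitude, y = latitude, as expected
    if _in_gb_lonlat(y, x):
        return "WGS84_SWAPPED"  # latitude and longitude are the wrong way around
    if 0.0 <= x <= 700_000.0 and 0.0 <= y <= 1_300_000.0:
        return "BNG"            # plausible eastings/northings in metres (EPSG:27700)
    return "UNKNOWN"


# Approximate location of central Manchester, used purely as an example.
row = {"longitude": "-2.2426", "latitude": "53.4808"}
x, y = lonlat_point(row)
print(diagnose(x, y))  # WGS84
print(diagnose(y, x))  # WGS84_SWAPPED
```

If your squished map's coordinates come back as swapped, re-add the layer with the X and Y fields exchanged in the delimited text dialog.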

Expressions

We have lots of data here, perhaps a bit too much. If we were only interested in a subset, let’s say ASB incidents that occurred in January, we can filter our data according to a condition within QGIS. We can do this within the attribute table. Open your attribute table as we did earlier, using the table icon in the toolbar at the top of the interface. This will open up a table, and we can see we have over 80,000 rows (i.e. observations, ASB incidents, points) in our data. There are a number of useful tabs at the top of this window. We can select specific rows, and then zoom to them on our map, for example. What we are interested in is Select features using an expression. This brings up a rather unfriendly looking window, but it is a powerful one. You don’t need to be completely familiar with how these expressions work, but it’s useful to know some basic syntax. For example, to select only ASB incidents which occurred in January, we can write "month" = '2017-01', where month is the variable and 2017-01 is the category we want.
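An expression like this is a condition that QGIS evaluates for every feature, keeping the features for which it is true. In QGIS expressions, field names go in double quotes and text values in single quotes. Below is a minimal Python analogue of that filtering step. The function name, the column names and the records are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def select_where(rows: List[Row], condition: Callable[[Row], bool]) -> List[Row]:
    """Return the rows (features) for which the condition is true."""
    return [row for row in rows if condition(row)]


# Hypothetical ASB records; only the month column matters here.
incidents = [
    {"month": "2017-01", "crime_type": "Anti-social behaviour"},
    {"month": "2017-02", "crime_type": "Anti-social behaviour"},
    {"month": "2017-01", "crime_type": "Anti-social behaviour"},
]

# Python counterpart of the expression "month" = '2017-01'.
january = select_where(incidents, lambda row: row["month"] == "2017-01")
print(len(january))  # 2
```

Passing a different condition, such as `row["month"] in {"2017-01", "2017-02", "2017-03"}`, would select the first quarter instead.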



Subset your data by those incidents that occurred in January


When you now click on Select features, and return to your map view, QGIS will have selected only those incidents that occurred in January, and highlighted them in yellow. We can now save this selection, and add it as a new layer, by right-clicking on the layer and exporting the selection.



Save a selection and create a new layer


This brings up a window in which you must select where you’d like the data to be saved and what to call it, along with a number of other options, such as whether to save it with a projection, and whether to add it to your current map view. You will notice that the data format is an ESRI Shapefile, a popular format for storing vector data like points.



Options when saving a new layer


Clipping

Point to polygon

Thematic map

Data resources

Further reading